(19) Europäisches Patentamt
     European Patent Office
     Office européen des brevets

(11) EP 1 172 994 A2

(12) EUROPEAN PATENT APPLICATION



(43) Date of publication: 

16.01.2002 Bulletin 2002/03 

(21) Application number: 01203574.7 

(22) Date of filing: 25.10.1995 



(51) Int Cl.7: H04M 3/493, G10L 15/24,
G10L 15/26, G10L 17/00, 
G10L 15/18 



(84) Designated Contracting States: 

BE CH DE DK ES FR GB IT LI NL PT SE 

(30) Priority: 25.10.1994 EP 94307843 

(62) Document number(s) of the earlier application(s) in 
accordance with Art. 76 EPC: 
95934749.3 / 0 800 698 

(71) Applicant: BRITISH TELECOMMUNICATIONS 
public limited company 
London EC1A 7AJ (GB) 



(72) Inventor: The designation of the inventor has not 
yet been filed 

(74) Representative: Lloyd, Barry George William et al 
BT Group Legal Services, Intellectual Property 
Department, 8th Floor, Holborn Centre, 120 
Holborn 

London EC1N 2TE (GB) 

Remarks: 

This application was filed on 20-09-2001 as a
divisional application to the application mentioned 
under INID code 62. 



(54) Voice-operated services 



(57) A method and apparatus for accessing a data- 
base where entries are linked to at least two sets of pat- 
terns. Recognition means recognise within a received 
signal one or more patterns of a first set of patterns. The 
recognised patterns are used to identify entries and
compile a list of patterns in a second set of patterns to
which those entries are also linked. The list is then used 
to recognise a second received signal. The received sig- 
nals may, for example, be voice signals or signals indi- 
cating the origin or destination of the received signals. 

Description

[0001] The present invention is concerned with auto- 
mated voice-interactive services employing speech rec- 
ognition, particularly, though not exclusively, for use 
over a telephone network. 

[0002] A typical application is an enquiry service 
where a user is asked a number of questions in order to 
elicit replies which, after recognition by a speech recog-
niser, permit access to one or more desired entries in
an information bank. An example of this is a directory 
enquiry system in which a user, requiring the telephone 
number of a telephone subscriber, is asked to give the 
town name and road name of the subscriber's address, 
and the subscriber's surname. 

[0003] According to one aspect of the present inven- 
tion there is provided a speech recognition apparatus 
comprising a store of data containing entries to be iden- 
tified and information defining for each entry a connec- 
tion with a word of a first set of words and a connection 
with a word of a second set of words; speech recognition 
means; and control means operable: 

(a) so to control the speech recognition means as 
to identify by reference to recognition information 
for the first set of words as many words of the first 
set as meet a predetermined criterion of similarity 
to first received voice signals; 

(b) upon such identification, to compile a list of all 
words of the second set which are defined as con- 
nected with entries defined as connected also with 
the identified word(s) of the first set; and 

(c) so to control the speech recognition means as 
to identify by reference to recognition information 
for the second set of words one or more words of 
the list which resemble(s) second received voice 
signals. 

[0004] Preferably the speech recognition means is 
operable upon receipt of the first voice signal to gener- 
ate for each identified word a measure of similarity with 
the first voice signal, and the control means is operable 
to generate for each word of the list a measure obtained 
from the measure(s) for the relevant word(s) of the first 
set (i.e. those identified words of the first set with which
a word of the list has a common entry). The speech rec- 
ognition means is then operable upon receipt of the sec- 
ond voice signal to perform the identification of one or 
more words of the list in accordance with a recognition 
process weighted in dependence on the measures gen- 
erated for the words of the list. 
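
By way of illustration only, the compilation of the list and the carrying forward of similarity measures might be sketched as follows. The toy entries, the field names `town` and `road`, and the choice of the maximum as the way of combining measures are all assumptions made for the sketch; the text does not prescribe how the measure for a word of the list is obtained from the measures for the first set.

```python
# Hypothetical toy store: each entry is connected with one word of the
# first set (a town) and one word of the second set (a road).
ENTRIES = [
    {"town": "Ipswich", "road": "High Street"},
    {"town": "Ipswich", "road": "Mill Lane"},
    {"town": "Norwich", "road": "Mill Lane"},
    {"town": "Norwich", "road": "Castle Road"},
    {"town": "Harwich", "road": "Quay Street"},
]


def compile_second_list(entries, first_scores):
    """Return {second-set word: measure} for every word of the second set
    that shares an entry with an identified word of the first set.

    The measure is the best first-stage score among the relevant
    first-set words (an assumed combination rule).
    """
    measures = {}
    for entry in entries:
        score = first_scores.get(entry["town"])
        if score is None:
            continue  # this entry's town was not identified in stage one
        road = entry["road"]
        measures[road] = max(measures.get(road, 0.0), score)
    return measures


# Two towns met the similarity criterion for the first voice signal.
first_scores = {"Ipswich": 0.8, "Norwich": 0.6}
road_measures = compile_second_list(ENTRIES, first_scores)
```

The resulting measures could then be used to weight the second recognition process, for instance as prior weights on the words of the list.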

[0005] The apparatus may also include a store con- 
taining recognition data for all words of the second set 
and the control means is operable following the compi- 
lation of the list and before recognition of the word(s) of 
the list to mark in the recognition data store those items 
of data therein which correspond to the words not in the 
list or those which correspond to words which are in the
list, whereby the recognition means may ignore all
words so marked or, respectively, not marked.

[0006] Alternatively the recognition data may be gen-
erated dynamically either before recognition or during
recognition, the control means being operable following
the compilation of the list to generate recognition data
for each word of the list. Methods for dynamically gen-
erating recognition data fall outside the scope of the
present invention but will be clear to those skilled in this
art.

[0007] Preferably the control means is operable to se- 
lect for output that entry or entries defined as connected 
both with an identified word(s) of the first set and an 
identified word of the second set. 

[0008] The store of data may also contain information
defining for each entry a connection with a word of a
third set of words, the control means being operable:

(d) to compile a list of all words of the third set which
are defined as connected with entries each of which
is also defined as connected both with an identified
word of the first set and an identified word of the
second set; and

(e) so to control the speech recognition means as
to identify by reference to stored recognition infor-
mation for the third set of words one or more words
of the list which resemble(s) third received voice
signals.

[0009] Furthermore, means may be included to store
at least one of the received voice signals, the apparatus
being arranged to perform an additional recognition
process in which the control means is operable:

(a) so to control the speech recognition means as
to identify by reference to stored recognition infor-
mation for the second set of words a plurality of
words of the second set which meet a predeter-
mined criterion of similarity to the second received
voice signals;

(b) to compile an additional list of all words of the
first set which are defined as connected with entries
defined as connected also with the identified words
of the second set; and

(c) so to control the speech recognition means as
to identify by reference to stored recognition infor-
mation for the first set of words one or more words
of the said additional list which resemble(s) the first
received voice signals.

[0010] Preferably the apparatus includes means to 
recognise a failure condition and to initiate the said ad- 
ditional recognition process only in the event of such fail- 
ure being recognised. 

[0011] The apparatus may comprise a telephone line
connection; a speech recogniser for recognising spoken
words received via the telephone line connection, by ref-
erence to recognition data representing a set of possible
utterances; and means responsive to receipt via the tel-
ephone line connection of signals indicating the origin 
or destination of a telephone call to access stored infor- 
mation identifying a subset of the set of utterances and 
to restrict the recogniser operation to that subset. 

[0012] According to a further aspect of the invention,
a telephone apparatus comprises a telephone line con- 
nection; a speech recogniser for determining or verifying 
the identity of the speaker of spoken words received via 
the telephone line connection, by reference to recogni- 
tion data corresponding to a set of possible speakers; 
and means responsive to receipt via the telephone line 
connection of signals indicating the origin or destination 
of a telephone call to access stored information identi- 
fying a subset of the set of speakers and to restrict the 
recogniser operation to that subset. 
[0013] According to a yet further aspect of the inven- 
tion, a telephone information apparatus comprises a tel- 
ephone line connection; a speech recogniser for recog- 
nising spoken words received via the telephone line 
connection, by reference to one of a plurality of stored 
sets of recognition data; and means responsive to re- 
ceipt via the telephone line connection of signals indi- 
cating the origin or destination of a telephone call to ac- 
cess stored information identifying one of the sets of rec- 
ognition data and to supply this set to the recogniser. 
[0014] The stored sets may, for example, correspond
to different languages or regional accents or, say, two
of the sets may correspond to the characteristics of dif-
ferent types of telephone apparatus, for instance the
characteristics of a mobile telephone channel.

According to a further aspect of the invention a recog-
nition apparatus comprises

a store defining a first set of patterns; 
a store defining a second set of patterns; 
a store containing entries to be identified; 
a store containing information relating each entry to 
a pattern of the first set and to a pattern of the sec- 
ond set; 

recognition means operable upon receipt of a first 
input pattern signal to identify as many patterns of 
the first set as meet a predetermined recognition cri- 
terion; 

means to generate a list of all patterns of the second 
set which are related to an entry to which an iden- 
tified pattern(s) of the first set is also related; and 
recognition means operable upon receipt of a sec- 
ond input pattern signal to identify one or more pat- 
terns of the list. 

[0015] The patterns may represent speech and the 
recognition means be a speech recogniser. 
[0016] In accordance with the invention, a speech rec- 
ognition apparatus comprises 

(i) a store of data containing entries to be identified 
and information defining for each entry a connection
with a signal of a first set of signals and a connection
with a word of a second set of words;

(ii) means for identifying a received signal as corre-
sponding to as many signals of the first set as meet
a predetermined criterion;

(iii) control means operable to compile a list of all
words of the second set which are defined as con-
nected with entries defined as connected also with
the identified signal(s) of the first set; and

(iv) speech recognition means operable to identify
by reference to stored recognition information for
the second set of words one or more words of the
list which resemble(s) received voice signals.

[0017] Preferably the first set of signals are voice sig-
nals representing spelled versions of the words of the
second set or initial portions thereof and the identifying
means are formed by the speech recognition means op-
erating by reference to stored recognition information for
the said spelled voice signals. Alternatively the first set
of signals may be signals consisting of tones and the
identifying means is a tone recogniser. The first set of
signals may indicate the origin or destination of the re-
ceived signal.

[0018] In accordance with a further aspect of the in-
vention, a method of identifying entries in a store of data
by reference to stored information defining connections
between entries and words, comprises

(a) identifying one or more of the said words as
present in received voice signals;

(b) compiling a list of those of the said words defined
as connected with entries defined as connected al-
so with the identified word(s);

(c) identifying one or more of the words of the list
as present in the received voice signals.

[0019] In a further aspect of the invention a speech
recognition apparatus comprises

(a) a store of data containing entries to be identified
and information defining for each entry a connection
with at least two words;

(b) a speech recognition means able to identify by
reference to stored recognition information for a de-
fined set of words at least one word or word se-
quence which meets some predefined criterion of
similarity to a received voice signal;

(c) a control means operable:

(i) to compile a list of words which are defined
as connected with entries defined as connected
with a word previously identified by the speech
recognition means; and

(ii) so to control the speech recognition means
as to identify by reference to stored recognition
information for the compiled list one or more
words or word sequences which resemble a
further received voice signal.

[0020] A method of speech recognition by reference 
to a stored set of words to be recognised, according to 
the invention comprises 

(a) receiving a speech signal; 

(b) storing the speech signal; 

(c) receiving a second signal; 

(d) compiling a list of words, being a subset of the 
set of words, as a function of the second signal; 

(e) applying to the stored speech signal a speech 
recognition process so as to identify by reference 
to the list one or more words of the subset. 

[0021] The second signal may also be a speech sig- 
nal, and the second signal may be recognised by refer- 
ence to recognition data representing the letters of the 
alphabet, either individually or as sequences. Alterna-
tively the second signal may be a signal consisting of
tones generated by a keypad. 
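
As a sketch only, steps (a) to (e) of the method of paragraph [0020] might be modelled as below, with the second signal taken to be a spelled prefix as suggested in paragraph [0021]. The recogniser is a toy stand-in: the stored speech signal is represented directly as a hypothetical mapping from words to similarity scores, and the names `recognise` and `words_with_prefix` and the threshold of 0.5 are assumptions of the sketch.

```python
def recognise(signal, vocabulary, threshold=0.5):
    """Toy recogniser: `signal` maps candidate words to similarity scores.
    Only words in `vocabulary` that reach the threshold are returned."""
    return {w: s for w, s in signal.items() if w in vocabulary and s >= threshold}


def words_with_prefix(vocabulary, prefix):
    """Step (d): compile the subset of words consistent with the spelling."""
    return {w for w in vocabulary if w.lower().startswith(prefix.lower())}


TOWNS = {"Norwich", "Northwich", "Harwich", "Ipswich"}

# Steps (a) and (b): the spoken town name is received and stored.
stored_speech = {"Norwich": 0.55, "Northwich": 0.52, "Harwich": 0.60, "Ipswich": 0.30}

# Step (c): the second signal (here, spelled letters) has been recognised.
spelled_prefix = "NOR"

# Step (d): compile the list as a function of the second signal.
shortlist = words_with_prefix(TOWNS, spelled_prefix)

# Step (e): apply recognition to the stored speech, constrained to the list.
result = recognise(stored_speech, shortlist)
```

In this toy case "Harwich" has the highest unconstrained score but is excluded because it is not in the list compiled from the spelling.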

[0022] According to another aspect of the invention, 
a method of speech recognition comprises 

(a) receiving a speech signal; 

(b) storing the speech signal; 

(c) performing a recognition operation on the 
speech signal or some other signal; 

(d) in the event of the recognition operation failing 
to meet a predetermined criterion of reliability, re- 
trieving the stored speech signal and performing a 
recognition operation thereon. 

[0023] Some embodiments of the invention will now 
be described, by way of example, with reference to the 
accompanying drawings, in which: 

Figure 1 shows schematically the architecture of a 
directory enquiry system; 

Figure 2 is a flow chart illustrating the operation of
the directory enquiry system of Figure 1;

Figure 2a is a flow chart illustrating a second em-
bodiment of operation of the directory enquiry sys-
tem of Figure 1;

Figure 3 is a flow chart illustrating the use of CLI in
the operation of the directory enquiry system of Fig-
ure 1;

Figure 3a includes a further information gathering
step for use in the operation of the directory enquiry
system of Figure 1;

Figure 4 is a flow chart illustrating a further mode of
operation of the directory enquiry system of Figure
1.

[0024] The embodiment of the invention now to be de- 
scribed addresses the same directory enquiry task as 
was discussed in the introduction. It operates by firstly 
asking an enquirer for a town name and, using a speech
recogniser, identifies as "possible candidates" two or
more possible town names. It then asks the enquirer for
a road name and recognition of the reply to this question
then proceeds by reference to stored data pertaining to
all road names which exist in any of the candidate towns.
Similarly, the surname is asked for, and a recognition
stage then employs recognition data for all candidate
road names in candidate towns. The number of candi-
dates retained at each stage can be fixed, or (preferably)
all candidates meeting a defined acceptance criterion -
e.g. having a recognition 'score' above a defined thresh-
old - may be retained.
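
The two retention policies mentioned above (a fixed number of candidates, or all candidates whose score exceeds a threshold) can be contrasted in a short sketch. The scores, the function names and the threshold value are hypothetical.

```python
def retain_fixed(scores, n):
    """Keep the n best-scoring candidates, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def retain_above(scores, threshold):
    """Keep every candidate whose score is above the threshold, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, score in ranked if score > threshold]


town_scores = {"Norwich": 0.81, "Northwich": 0.74, "Harwich": 0.42, "Ipswich": 0.15}

fixed = retain_fixed(town_scores, 2)
thresholded = retain_above(town_scores, 0.4)
```

With the threshold policy the number of candidates carried forward varies with how ambiguous the reply was, which is the behaviour the description prefers.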

[0025] Before describing the process in more detail, 
the architecture of a directory enquiry system will be de-
scribed with reference to Figure 1. A speech synthesiser
1 is provided for providing announcements to a user via
a telephone line interface 2, by reference to stored, fixed
messages in a message data store 3, or from variable
information supplied to it by a main control unit 4. Incom-
ing speech signals from the telephone line interface 2
are conducted to a speech recogniser 5 which is able to
recognise spoken words by reference to, respectively,
town name, road name or surname recognition data in
recognition data stores 6, 7, 8.

[0026] A main directory database 9 contains, for each
telephone subscriber in the area covered by the direc-
tory enquiry service, an entry containing the name, ad-
dress and telephone number of that subscriber, in text
form. The town name recognition data store 6 contains,
in text form, the names of all the towns included in the
directory database 9, along with stored data to enable
the speech recogniser 5 to recognise those town names
in the speech signal received from the telephone line
interface 2. In principle, any type of speech recogniser
may be used, but for the purposes of the present de-
scription it is assumed that the recogniser 5 operates
by recognising distinct phonemes in the input speech,
which are decoded by reference to stored data in the
store 6 representing a decoding tree structure construct-
ed in advance from phonetic translations of the town
names stored in the store 6, decoded by means of a
Viterbi algorithm. The stores 7, 8 for road name recog-
nition data and surname recognition data are organised
in the same manner. Although, for example, the sur-
name recognition data store 8 contains data for all the
surnames included in the directory database 9, it is con-
figurable by the control unit 4 to limit the recognition
process to only a subset of the names, typically by flag-
ging the relevant parts of the recognition data so that
the "recognition tree" is restricted to recognising only
those names within a desired subset of the names.

[0027] This enables the 'recognition tree' to be built
before the call commences and then manipulated during
the call. By restricting the active subset of the tree, com-
putational resources can be concentrated on those
words which are most likely to be spoken. This reduces
the chances that an error will occur in the recognition
process, in those cases where one of these most likely
words has been spoken.
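
A minimal sketch of the flagging idea follows, assuming a flat vocabulary rather than a real decoding tree. The class `RecognitionStore` and its methods are hypothetical names introduced for illustration; the point shown is only that the full recognition data stays in place and a per-word flag decides what is active during a call.

```python
class RecognitionStore:
    """Hypothetical stand-in for a recognition data store such as store 8."""

    def __init__(self, words):
        # All words are built into the store before the call commences;
        # only the 'active' flag is manipulated during the call.
        self.active = {word: True for word in words}

    def restrict_to(self, subset):
        """Flag as active only those stored words that appear in `subset`."""
        allowed = set(subset)
        for word in self.active:
            self.active[word] = word in allowed

    def active_words(self):
        """Words the recogniser is currently permitted to return."""
        return sorted(word for word, on in self.active.items() if on)


store = RecognitionStore(["Smith", "Jones", "Patel", "Brown"])
# "Unknown" is not in the store, so it is simply ignored.
store.restrict_to(["Jones", "Brown", "Unknown"])
```

Because nothing is deleted, a later call to `restrict_to` with a different list restores other words without rebuilding the store.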
[0028] Each entry in the town data store 6 contains, 
as mentioned above, text corresponding to each of the 
town names appearing in the database 9, to act as a 
label to link the entry in the store 6 to entries in the da- 
tabase 9 (though other kinds of label may be used if pre- 
ferred). If desired, the store 6 may contain an entry for 
every town name that the user might use to refer to ge- 
ographical locations covered by the database, whether 
or not all these names are actually present in the data- 
base. Noting that some town names are not unique 
(there are four towns in the UK called Southend), and 
that some town names carry the same significance (e.g.
Hammersmith, which is a district of London, means
the same as London as far as entries in that district are 
concerned), an equivalence data store 39 is also pro- 
vided, containing such equivalents, which can be con- 
sulted following each recognition of a town name, to re- 
turn additional possibilities to the set of town names con- 
sidered to be recognised. For example if "Hammer- 
smith" is recognised, London is added to the set; if 
"Southend" is recognised, then Southend-on-Sea, 
Southend (Campbeltown), Southend (Swansea) and 
Southend (Reading) are added.

[0029] The equivalence data store 39 could, if de- 
sired, contain similar information for roads and sur- 
names, or first names if these are used; for example 
Dave and David are considered to represent the same 
name. 
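
The consultation of the equivalence data store 39 after a recognition can be pictured with the sketch below. The dictionary layout and the function name `expand` are assumptions; the Hammersmith and Southend equivalents are those given in the description.

```python
# Hypothetical contents of the equivalence data store 39.
EQUIVALENTS = {
    "Hammersmith": ["London"],
    "Southend": [
        "Southend-on-Sea",
        "Southend (Campbeltown)",
        "Southend (Swansea)",
        "Southend (Reading)",
    ],
}


def expand(recognised, equivalents):
    """Return the recognised names plus any equivalents, without duplicates
    and preserving the order in which names were first seen."""
    expanded = list(recognised)
    for name in recognised:
        for extra in equivalents.get(name, []):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


towns = expand(["Hammersmith", "Southend"], EQUIVALENTS)
```

The same function would serve for roads, surnames or first names if the store held equivalents for them.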

[0030] As an alternative to this structure, the vocabu- 
lary equivalence data store 39 may act as a translation 
between labels used in the name stores 6, 7, 8 and the 
labels used in the database (whether or not the labels 
are names in text form). 

[0031] The use of text to define the basic vocabulary 
of the speech recogniser requires that the recogniser 
can relate one or more textual labels to a given pronun- 
ciation. That is to say in the case of a 'recognition tree', 
each leaf in the tree may have one or more textual labels 
attached to it. If the restriction of the desired vocabulary 
of a recogniser is also defined as a textual list, then the 
recogniser should preferably return only textual labels 
in that list, not labels associated with a pronunciation 
associated with a label in the list that are not themselves 
in the list. 

[0032] The system operation is illustrated by means 
of the flowchart set out in Figure 2. The process starts 
(10) upon receipt of an incoming telephone call signalled
to the control unit 4 by the telephone line interface 2; the
control unit responds by instructing the speech synthe-
siser 1 to play (11) a message stored in the message
store 3 requesting the caller to give the name of the re-
quired town. The caller's response is received (12) by
the recogniser. The recogniser 5 then performs its rec-
ognition process (13) with reference to the data stored
in the store 6 and communicates to the control unit 4 the 
name of the town which most clearly resembles the re-
ceived reply or (more preferably) the names of all those
towns which meet a prescribed threshold of similarity
with the received reply. We suppose (for the sake of this
example) that four town names meet this criterion. The
control unit 4 responds by instructing the speech syn-
thesiser to play (14) a further message from the mes-
sage data store 3 and meanwhile accesses (15) the di-
rectory database 9 to compile a list of all road names
which are to be found in any of the geographical loca-
tions corresponding to those four town names and also
any additional location entries obtained by accessing
the equivalence data store 39. It then uses (16) this in-
formation to update the road name recognition data
store 7 so that the recogniser 5 is able to recognise only
the road names in that list.

[0033] The next stage is that a further response, re-
lating to the road name, is received (17) from the caller
and is processed by the recogniser 5 utilising the data
store 7; suppose that five road names meet the recog-
nition criterion. The control unit 4 then instructs the play-
ing (19) of a further message asking for the name of the
desired telephone subscriber and meanwhile (20) re-
trieves from the database 9 a list of the surnames of all
subscribers residing in roads having any of the five road
names in any of the four geographical locations (and any
equivalents), and updates the surname recognition da-
ta store 8 in a similar manner as described above for the
road name recognition data store. Once the user's re-
sponse is received (22) by the recogniser, the surname
may be recognised (23) by reference to the data in the
surname recognition data store.
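
The list compilation at steps 15 and 20 might be sketched as two queries on the directory, as below. The miniature directory, its tuple layout and the function names are hypothetical and serve only to show how each list is narrowed by the candidates retained at the preceding stage.

```python
# Hypothetical miniature of directory database 9: (town, road, surname).
DIRECTORY = [
    ("Norwich", "Mill Lane", "Smith"),
    ("Norwich", "Castle Road", "Jones"),
    ("Northwich", "Mill Lane", "Patel"),
    ("Harwich", "Quay Street", "Brown"),
]


def roads_in_towns(directory, towns):
    """Step 15: all road names found in any of the candidate towns."""
    return sorted({road for town, road, _ in directory if town in towns})


def surnames_in(directory, towns, roads):
    """Step 20: surnames of subscribers in any candidate road of any
    candidate town."""
    return sorted(
        {name for town, road, name in directory if town in towns and road in roads}
    )


candidate_towns = {"Norwich", "Northwich"}
road_list = roads_in_towns(DIRECTORY, candidate_towns)

candidate_roads = {"Mill Lane"}
surname_list = surnames_in(DIRECTORY, candidate_towns, candidate_roads)
```

Each returned list is what would be used to update the corresponding recognition data store (7 or 8) before the next reply is recognised.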

[0034] It may of course be that more than one sur- 
name meets the recognition criterion; in any event, the 
database 9 may contain more than one entry for the 
same name in the same road in the same town. There-
fore at step 24 the number of directory entries which
have one of the recognised surnames and one of the
recognised road names and one of the recognised town
names is tested. If the number is manageable, for ex-
ample if it is three or fewer, the control means instructs
(25) the speech synthesiser to play an announcement
from the message data store 3, followed by recitation of
the name, address and telephone number of each entry,
generated by the speech synthesiser 1 using text-to-
speech synthesis, and the process is complete (26). If,
on the other hand, the number of entries is excessive
then further steps 27, to be discussed further below, will
be necessary in order to meet the caller's enquiry.
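
The test at step 24 can be illustrated as follows. The limit of three is the example figure given above; the entry layout, the function names and the treatment of an empty result as requiring further steps are assumptions of the sketch.

```python
MAX_ANNOUNCED = 3  # "three or fewer", the example given in the text

# Hypothetical directory entries: (town, road, surname).
ENTRIES = [
    ("Norwich", "Mill Lane", "Smith"),
    ("Norwich", "Mill Lane", "Smith"),   # same name, road and town: allowed
    ("Norwich", "Castle Road", "Smith"),
    ("Northwich", "Mill Lane", "Smythe"),
    ("Harwich", "Quay Street", "Brown"),
]


def matching_entries(entries, towns, roads, surnames):
    """Entries having one of the recognised towns, roads and surnames."""
    return [e for e in entries if e[0] in towns and e[1] in roads and e[2] in surnames]


def next_step(matches, limit=MAX_ANNOUNCED):
    """Announce the entries if the number is manageable, otherwise
    indicate that further steps are needed."""
    return "announce" if 1 <= len(matches) <= limit else "further_steps"


few = matching_entries(ENTRIES, {"Norwich"}, {"Mill Lane"}, {"Smith"})
many = matching_entries(
    ENTRIES,
    {"Norwich", "Northwich"},
    {"Mill Lane", "Castle Road"},
    {"Smith", "Smythe"},
)
```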

[0035] It will be seen that the process described will
have a lower failure rate than a system which chooses
only a single candidate town, road or surname at each
stage of the recognition process, since by retaining sec-
ond and further choice candidates the possibility of error
due to mis-recognition is reduced, though there is in-
creased risk of recognition error due to the larger vocab-
ulary. A penalty for this increased reliability is of course
increased computation time, but by ensuring that the
road name and surname recognition processes are con-
ducted over only a limited number of the total number





of road names and surnames in the database, the com- 
putation can be kept to manageable proportions. 
[0036] Moreover compared with a system in which a 
second-stage recognition is unconstrained by the re- 
sults of a previous recognition (e.g. one where the 'road' 
recognition process is not limited to roads in towns
already recognised) the proposed system would, when 
using recognisers (such as those using Hidden Markov 
Models) which internally "prune" intermediate results, 
be less liable to prune out the desired candidate in fa- 
vour of other candidate roads from unwanted towns. 
[0037] It will be seen too, that the number of possible 
lists will, in most applications, be so large as to prohibit 
their preparation in advance, and hence the construc- 
tion of the list is performed as required. Where the rec- 
ogniser is of the type (e.g. recognisers using Hidden 
Markov models) which require setting up for a particular 
vocabulary, there are two options for updating the rele- 
vant store to limit the recogniser's operation to words in 
the list. One is to start with a fully set-up recogniser, and 
disable all the words not in the list; the other is to clear 
the relevant recognition data store and set it up afresh 
(either completely, or by adding words to a permanent 
basic set). It should be noted that some recognisers do 
not store recognition data for all words which may be 
recognised. These recognisers generally have a store 
of textual information relating to the words that may be 
recognised but do not prestore data to enable the 
speech recogniser to recognise words in a received sig- 
nal. In such so-called "dynamic recognisers" the recog- 
nition data is generated either immediately before or 
during recognition.

[0038] The first option requires large data stores but 
is relatively inexpensive computationally for any list size. 
The second option is generally computationally expen- 
sive for large lists but requires much smaller data stores 
and is useful when there are frequent data changes. 
Generally the first option would be preferred, with the 
second option being invoked in the case of a short list, 
or where the data change frequently. 

[0039] The criterion for limiting the number of recog-
nition 'hits' at steps 13, 18 or 23 may be that all candi-
dates are retained which meet some similarity criterion,
though other criteria such as retaining always a fixed 
number of candidates may be chosen if preferred. It may 
be, in the earlier recognition stages, that the computa- 
tional load and effect on recognition performances of re- 
taining a large town (say) with a low score is not consid- 
ered to be justified, whereas retaining a smaller town 
with the same score might be. In this case the scores of 
a recognised word may be weighted by factors depend- 
ent on the number of entries referencing that word, in 
order to achieve such differential selection. 
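
One possible form of such a weighting is sketched below. The description does not give a formula, so the logarithmic penalty, the constant `alpha` and the acceptance threshold are purely assumptions chosen to show the intended effect: for equal recognition scores, a word referenced by many entries is less readily retained than one referenced by few.

```python
import math


def weighted_score(score, entry_count, alpha=0.05):
    """Reduce a recognition score by an amount that grows with the number
    of directory entries referencing the word (assumed formula)."""
    return score - alpha * math.log10(entry_count)


THRESHOLD = 0.3  # hypothetical acceptance threshold

# Same raw score, very different numbers of referencing entries.
small_town = weighted_score(0.5, 1_000)
large_town = weighted_score(0.5, 1_000_000)

retain_small = small_town >= THRESHOLD
retain_large = large_town >= THRESHOLD
```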
[0040] In the examples discussed above, a list of 
words (such as road names) to be recognised is gener- 
ated based on the results of an earlier recognition of a 
word (the town name). However it is not necessary that 
the unit in the earlier recognition step or in the list be
single words; they could equally well be sequences of
words. One possibility is a sequence of the names of
the letters of the alphabet, for example a list of words
for a town name recognition step may be prepared from
an earlier recognition of the answer to the question
"please spell the first four letters of the town name." If
recording facilities are provided (as discussed further
below) it is not essential that the order of recognition be
the same as the order of receipt of the replies (it being
more natural to ask for the spoken word first, followed
by the spelled version, though it is preferred to process
them in the opposite sequence).

[0041] It is assumed in the above description that the 
recognisers always produce a result - i.e. that the town
(etc.) name or names which give the nearest match(es)
to the received response are deemed to have been rec-
ognised. It would of course be possible to permit output
of a "fail" message in the event that a reasonably accu-
rate match was not found. In this case further action may
be desired. This could simply be switching the call to a
manual operator. Alternatively further information may
be processed automatically as shown in Figure 2a. In this
example a low confidence match 40 has still resulted in
four possible candidate towns. Because of the question-
able accuracy of this match a further message is played
to the caller asking for an additional reply which may be
checked against existing recognition results. In the ex-
ample, a spelling of the town name is requested 41 al-
lowing all permissible spellings of all town names in the
recognition vocabulary. Following a confident recogni-
tion 43 two spellings are recognised. These two town
names may be considered more confident than the four
spoken town names recognised previously, but a com-
parison 44 of both lists may reveal one or more common
town names in both lists. If this is so 46 then a very high
confidence of success may be inferred for these com-
mon town names and the enquiry may proceed, for ex-
ample, in the same manner as Figure 2 using these
common towns to prepare the road name recognition
15. If no common town names are found then the two
spelt towns may be retained 47 for use in the next stage
which may be preparing the road name recogniser 15
with the two town names as shown in the diagram, or
may be a different processing step not shown in Figure
2a, for example a confirmation of the more confident of
the two town names with the user in order to increase
the system confidence before a subsequent request for
information is made.
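
The comparison at step 44 of Figure 2a amounts to intersecting the two candidate lists, which might be sketched as follows. The function name `combine` and the string labels returned are hypothetical; the fallback to the spelt towns when nothing is common follows step 47 as described.

```python
def combine(spoken, spelled):
    """Compare the towns recognised from the spoken reply with those
    recognised from the spelling.

    Returns (towns to carry forward, label describing the outcome).
    """
    common = [town for town in spelled if town in spoken]
    if common:
        return common, "common"        # step 46: very high confidence
    return list(spelled), "spelled_only"  # step 47: keep the spelt towns


spoken_towns = ["Norwich", "Northwich", "Harwich", "Horwich"]

agreed = combine(spoken_towns, ["Norwich", "Norwood"])
disagreed = combine(spoken_towns, ["Ipswich", "Ipsden"])
```

Either result is then used to prepare the road name recognition in the same way as the candidate towns of Figure 2.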

[0042] It is not necessary that the responses to be rec-
ognised be discrete responses to discrete questions.
They could be words extracted by a recogniser from a
continuous sentence, for systems which work in this
way.

[0043] Another situation in which it may be desired to
vary the scope of the speech recogniser's search is
where it can be modified on the basis not of previous
recogniser results but of some external information rel-
evant to the enquiry. In a directory enquiry system this
may be a signal indicating the origin of a telephone call,
such as the calling line identity (CLI) or a signal identi-
fying the originating exchange. In a simple implementa-
tion this may be used to restrict town name recognition
to those town names located in the same or an adjacent
exchange area to that of the caller. In a more sophisti- 
cated system this identification of the calling line or ex- 
change may be used to access stored information com- 
piled to indicate the enquiry patterns of the subscriber 
in question or of subscribers in that area (as the case 
may be). 
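
The simple implementation described above can be sketched as a lookup over exchange areas. This is only an illustration: the area names, the adjacency table and the town lists are made-up data, and `restricted_town_vocabulary` is a hypothetical helper, not part of the described apparatus.

```python
# Hypothetical data: which exchange areas adjoin which, and the towns in each.
ADJACENT_AREAS = {
    "ipswich": {"colchester", "norwich"},
    "colchester": {"ipswich"},
    "norwich": {"ipswich"},
}
TOWNS_BY_AREA = {
    "ipswich": {"Ipswich", "Felixstowe", "Woodbridge"},
    "colchester": {"Colchester", "Harwich"},
    "norwich": {"Norwich", "Wymondham"},
}


def restricted_town_vocabulary(origin_area):
    """Return the town names in the caller's exchange area or an adjacent one."""
    areas = {origin_area} | ADJACENT_AREAS.get(origin_area, set())
    towns = set()
    for area in areas:
        towns |= TOWNS_BY_AREA.get(area, set())
    return towns
```

The returned set would then be used to limit the town name recogniser; an unknown origin yields an empty set, in which case a real system would presumably fall back to the full vocabulary.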

[0044] For example, a sample of directory enquiries 
in a particular area might show that 40% of such calls 
were for numbers in the same exchange area and 20% 
for immediately adjacent areas. Separate statistical pat- 
terns might be compiled for business or residential lines, 
or for different times of day, or other observed trends 
such as global usage statistics of a service that are not 
related to the nature or location of the originating line.
[0045] The effect of this approach can be to improve 
the system reliability for common enquiries at the ex- 
pense of uncommon ones. Such a system thus aims to 
automate the most common or straightforward enquir- 
ies, with other calls being dealt with in an alternative 
manner, for example being routed to a human operator. 
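
One way such statistics might be turned into a restricted vocabulary is to keep the most frequently requested towns until a chosen share of enquiries is covered. The sketch below assumes per-town enquiry counts are available for an exchange; the function name, the counts and the coverage target are all illustrative assumptions.

```python
def select_common_towns(enquiry_counts, target_coverage):
    """Pick the most-requested towns until their share reaches target_coverage.

    enquiry_counts maps town name -> number of past enquiries (hypothetical data).
    Returns the chosen towns and the fraction of enquiries they cover.
    """
    total = sum(enquiry_counts.values())
    chosen, covered = [], 0
    # Most frequently requested towns first; ties broken alphabetically.
    ranked = sorted(enquiry_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for town, count in ranked:
        if total == 0 or covered / total >= target_coverage:
            break
        chosen.append(town)
        covered += count
    return chosen, (covered / total if total else 0.0)
```

Enquiries for towns outside the chosen list would be the "uncommon" ones that are handled in some alternative manner.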
[0046] As an example, Figure 1 additionally shows a
CLI detector 20 (used here only to indicate the originating
exchange), which is used to select from a store 21 a
list of likely towns for enquiries from that exchange. The
control unit 4 uses this list to truncate the "town name"
recognition, as indicated in the flowchart of Figure 3:
the calling line indicator signal is detected at step
10a and selects (12a) a list of town names from the
store 21, which is then used (12b) to update the town
name recognition store 6 prior to the town name
recognition step 13. The remainder of the process is not
shown as it is the same as that given in Figure 2.
[0047] An extension of this approach is to improve the 
system reliability and speed for common enquiries, 
whilst using additional information to enable the less 
common enquiries to succeed. Thus the less common 
enquiries are still able to succeed but require more effort 
and information to be supplied by the caller than the 
common enquiries require. 

[0048] As an example consider Figure 3a. The spoken
town name is asked for 11, and the CLI is detected
10a. As in Figure 3, the CLI is then related to town
names commonly requested by callers with that CLI
identity 12a. These town names update the spoken town
name store 12b. This process is identical to that shown
in Figure 3 so far. Additionally, as the speech is gathered
for recognition it is stored for later re-recognition 37. The
restricted town name set used in the recognition 13 will
typically be a small vocabulary covering a significant
proportion of enquiries. If a word within this vocabulary
is spoken and confidently recognised 48 then the enquiry
may immediately use this recognised town or
towns to prepare the road name store and continue as
described in Figure 2.

[0049] If the word is recognised as being outside of 
the vocabulary or of poor confidence then an additional 
message 49 is played to ask the caller for more infor- 
mation, which in this case is the first four letters of the 
town name. Simultaneously, an additional re-recogni- 
tion of the spoken town name 53 may be performed 
which can recognise any of the possible town names in
the directory. In this example we assume that four town 
names are recognised 54. At the same time, the caller 
may be spelling in the first four letters of the town name 
50 and two spellings 51 have been confidently recog- 
nised. These two spellings are then expanded to the full 
town names which match them 52. It may be necessary 
to anticipate common spelling errors, additional or miss- 
ing letters, abbreviations, and punctuation in the prepa- 
ration of the spelling vocabulary, and the subsequent 
matching of the spelt recognition results to the full town 
names. Assume in this example that five town names
match the two spellings. 

[0050] A comparison 55 identical in purpose to that
described in Figure 2a (44) may then be performed
between the five town names derived from the two
spellings and the four re-recognised town names. If common
words are found in these two sets (only one common
word is assumed in this example) then this town name
may confidently be assumed to be the correct one and
the road name recognition data store 7 may be prepared
from it and the enquiry proceeds as shown in Figure 2.
[0051] In other cases, the spoken recognition 53 will 
be in error and no common words will be found. Alter- 
natively, the recognition of the town name 53, and its 
subsequent comparison 55, may be considered optional 
and omitted. In both of these instances the spoken town 
store will be updated 57 with the five towns derived from 
the two spellings 52 and the spoken town name re-rec- 
ognised again 58. In the example, it is assumed that a 
single confident town name was recognised. This town 
name may be used to configure the road name recog- 
nition data store 7 and the enquiry proceeds as shown 
in Figure 2. 
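
The fallback path of Figure 3a (steps 52, 55, 57 and 58) can be sketched as follows. The recogniser itself is not modelled: `recognise_within` is a hypothetical callable standing in for re-recognition of the stored utterance against a restricted vocabulary, and simple prefix matching is assumed for expanding spellings (the tolerance for spelling errors mentioned above is omitted).

```python
def expand_spellings(prefixes, all_towns):
    """Expand recognised spelt prefixes to the full town names they match (step 52)."""
    return {
        town
        for town in all_towns
        if any(town.upper().startswith(p.upper()) for p in prefixes)
    }


def resolve_town(spelt_prefixes, rerecognised, all_towns, recognise_within):
    """Combine spelling and re-recognition results in the manner of Figure 3a.

    rerecognised: town names from the unrestricted re-recognition (step 54).
    recognise_within: hypothetical stand-in for the recogniser; given a set of
    candidate towns it returns the best match for the stored utterance.
    """
    from_spelling = expand_spellings(spelt_prefixes, all_towns)
    common = from_spelling & set(rerecognised)   # comparison 55
    if common:
        return common
    # No agreement: restrict the town store to the spelt candidates (57)
    # and recognise the stored utterance again against them (58).
    return {recognise_within(from_spelling)}
```

Passing the recogniser in as a callable keeps the sketch independent of any particular speech recognition engine.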

[0052] The deliberate restriction of a vocabulary to on- 
ly the very most likely words as described above need 
not necessarily depend on CLI. The preparation of the 
road name vocabulary based on the recognised town 
names is itself an example of this, and the approach of 
asking for additional information, as shown in Figure 3a, 
may be used if any such restricted recognition results 
are not confident. Global observed or postulated behav- 
iour can also be used to restrict a vocabulary (e.g. the 
town store) in a similar way to CLI derived information, 
as can signals indicating the destination of a call. For 
example, callers may be encouraged to dial different ac- 
cess numbers for particular information. On receipt of a 
call by a common apparatus for all the information, the 
dialled number determines the subset of the vocabulary 
to be used in subsequent operation of the apparatus. 
The operation of the apparatus would then continue
similarly as described above with relation to CLI.
[0053] Additionally, the re-recognition of a gathered 
word that has been constrained by additional informa- 
tion such as the four letter spelling in Figure 3a could be 
based on any kind of information, for example DTMF 
entry via the telephone keypad, or a yes/no response to 
a question restricting the scope of the search (e.g. 
"Please say yes or no: does the person live in a city?"). 
This additional information could even be derived from 
the CLI using a different area store 21 based on different 
assumptions to the previously used one. 
[0054] In the above described embodiment, no ac- 
count is taken of the relative probability of recognition, 
for example if the town recognition step 13 recognises 
town names Norwich and Harwich, then when, at road 
recognition step 18, the recogniser has to evaluate the 
possibility that the caller said "Wright Street" (which we 
suppose to be in Norwich) or "Rye Street" (in Harwich), 
no account is taken of the fact that the spoken town bore
a closer resemblance to "Norwich" than it did to "Har- 
wich". If desired however, the recogniser may be ar- 
ranged to produce (in known manner) figures or 
"scores" indicating the relative similarity of each of the 
candidates identified by the recogniser to the original 
utterance and hence the supposed probability of it being 
the correct one. These scores may then be retained 
whilst a search is made in the directory database to de- 
rive a list of the vocabulary items of the next desired 
vocabulary that are related to the recognised words. 
These new vocabulary items may then be given the 
scores that the corresponding matching word attained. 
In the case where a word came from a match with more 
than one recognised word of the previous vocabulary, 
the maximum score of the two may be selected for ex- 
ample. These scores may then be fed as a priori prob- 
abilities to the next recognition stage to bias the selec- 
tion. This may be implemented in the process depicted 
in Figure 2 as follows. 

[0055] Step 13. The recogniser produces, for each
town, a score - e.g.

Harwich 40%
Norwich 25%
Nantwich 20%
Northwich 15%

[0056] Step 15. When the road list is compiled the
appropriate score is appended to the road name, e.g.

Wright Street 25%
Rye Street 40%
North Street (assumed to exist in both Norwich and Nantwich) 25%

and stored in the store 7.

[0057] Step 18. When the recogniser comes to recognise
the road name, it may pre-weight the recognition
network (for example in the case of Hidden Markov Models)
with the scores from store 7. It then recognises the
supplied word, with the resulting effect that these
weights make the more likely words less likely to be
prematurely pruned out. Alternatively, the recogniser may
recognise the utterance, and adjust its resulting scores
after recognition according to the contents of store 7.
This second option provides no benefit to the pattern
matching process, but both options propagate the relative
likelihood of an entry finally being selected from
vocabulary to vocabulary. For example, considering the
post-weighted option, if the recogniser would have
assigned the scores of 60%, 30% and 10% to Wright
Street, Rye Street and North Street respectively then the
weighted scores would be:

Wright Street (Norwich) 25% x 60% = 15%
Rye Street (Harwich) 40% x 30% = 12%
North Street (Norwich and Nantwich) 25% x 10% = 2.5%

[0058] Similar modification would of course occur for 
the steps 20, 21, 23. This is just one example of a 
scheme for score propagation. 
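
The worked figures above can be reproduced with a short sketch of the post-weighted option. The function names and the town-to-road table are illustrative assumptions; the maximum rule and the multiplication follow the example in the text.

```python
def propagate_scores(town_scores, roads_by_town):
    """Give each road the highest score among the recognised towns containing it (step 15)."""
    road_priors = {}
    for town, score in town_scores.items():
        for road in roads_by_town.get(town, ()):
            road_priors[road] = max(score, road_priors.get(road, 0.0))
    return road_priors


def post_weight(priors, recogniser_scores):
    """Multiply each road's recogniser score by its propagated prior (step 18, second option)."""
    return {
        road: priors[road] * score
        for road, score in recogniser_scores.items()
        if road in priors
    }


town_scores = {"Harwich": 0.40, "Norwich": 0.25, "Nantwich": 0.20, "Northwich": 0.15}
roads_by_town = {  # hypothetical directory extract
    "Norwich": ["Wright Street", "North Street"],
    "Harwich": ["Rye Street"],
    "Nantwich": ["North Street"],
}
priors = propagate_scores(town_scores, roads_by_town)
weighted = post_weight(
    priors, {"Wright Street": 0.60, "Rye Street": 0.30, "North Street": 0.10}
)
best = max(weighted, key=weighted.get)  # Wright Street ranks first (15% vs 12% vs 2.5%)
```

Note that Rye Street has the higher prior (40%) but Wright Street still wins once the road recogniser's own scores are applied.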

[0059] The possibility of switching to a manual operator
in the event of a "failure" condition has already been
mentioned. Alternatively a user could simply be asked
to repeat the action that has not been recognised.
However, further automated steps may be taken under
failure conditions.

[0060] A failure condition can be identified by noting
low recogniser output "scores", or excessive numbers
of recognised words all having similar scores (whether
by reference to local scores or to weighted scores), or
by comparing the scores with those produced by a
recogniser comparing the speech to out-of-vocabulary
models. Such a failure condition may arise in an
unconstrained search like that of the town name recognition
of step 13 in Figure 2. In this case it may be that better
results might be obtained by performing (for example)
the road name recognition step first (unconstrained) and
compiling a list of all town names containing the roads
found, to constrain a subsequent town name recognition
step. Or it may arise in a constrained search such as
that of step 13 in Figure 3 or steps 18 and 23 in Figure
2, where perhaps the constraint has removed the correct
candidate from the recognition set; in this case
removing the constraint - or applying a different one - may
improve matters.
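
The three failure cues listed above can be sketched as a single check. The thresholds and the function name are illustrative assumptions, and scores are taken to be values where higher means a better match.

```python
def is_failure(scores, rejection_score=None, min_score=0.5, margin=0.1, max_close=3):
    """Flag a recognition result as unreliable.

    scores: candidate word -> recogniser score (higher is better).
    rejection_score: score of an out-of-vocabulary model, if one was run.
    All thresholds are assumed values chosen for illustration only.
    """
    if not scores:
        return True
    best = max(scores.values())
    if best < min_score:
        return True  # low recogniser output score
    close = [s for s in scores.values() if best - s <= margin]
    if len(close) > max_close:
        return True  # too many candidates with similar scores
    if rejection_score is not None and rejection_score >= best:
        return True  # the out-of-vocabulary model fits at least as well
    return False
```

A result flagged in this way could trigger any of the recovery steps described: relaxing the constraint, applying a different one, or reordering the recognition stages.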

[0061] Thus one possible approach is to make provision
for recording the caller's responses, and in the
event of failure, reprocessing them using the steps set
out in Figure 2 (except the "play message" steps 11, 14,
19) but with the original sequence town name/road
name/surname modified. There are of course six
permutations of these. One could choose that one (or more)
of these which experience shows to be the most likely
to produce an improvement. The result of such a
reprocessing could be used alone, or could be combined with
the previous result, choosing for output those entries
identified by both processes.
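
As a small illustration of the reprocessing orders and of combining two results, the sketch below enumerates the alternative sequences and keeps the entries common to both passes. The field names and the representation of directory entries as a set of identifiers are assumptions.

```python
from itertools import permutations

FIELDS = ("town", "road", "surname")  # original question order


def alternative_orders(original):
    """All processing orders of the fields other than the one already tried."""
    return [order for order in permutations(original) if order != tuple(original)]


def combine_results(first_pass, second_pass):
    """Keep only the directory entries identified by both passes."""
    return set(first_pass) & set(second_pass)
```

With three fields there are six orders in total, so `alternative_orders(FIELDS)` yields the five not yet tried; which of them to run would be chosen from experience, as the text suggests.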

[0062] Another possibility is to perform an additional
search omitting one stage, and comparing the results
as for the 'spelled input' case.
[0063] If desired, processing using two (or more) such
sequences could be performed routinely (rather than
only under failure conditions); to reduce delays an
additional sequence might commence before completion
of the first; for example (in Figure 4) an additional,
unconstrained "road name" search 30 could be performed
(without recording the road name) during the "which
surname" announcement. From this, a list of surnames is
compiled (31) and the surname store updated (32).
Once the surnames from the list have been recognised
(33) a town name list may be compiled (34) and the town
name store updated (35). Then at step 36 the spoken
town name, previously stored at step 37, may be
recognised. The results of the two recognition processes may
then be compiled, suitably by selecting (38) those
entries which are identified by both processes.
Alternatively, if no common entries are found, the entries found by
one or the other or both of the processes may be used.
The remaining steps shown in Figure 4 are identical to
those in Figure 2.
[0064] The technique of storing an utterance and using
it in a restricted-vocabulary recognition process
following recognition of a later utterance has been
described as an option to be used alongside sequential
processing, as a cross-check or to provide additional
recognition results to be used in the case of difficulty.
However, it may be used alone, for example in
circumstances where one chooses to have the questions
asked in a sequence which seems natural to the user, so
as to improve speed and reliability of response, but to
process the answers in a sequence which is more suited
to the nature of the data. For example in Figure 4, the
right hand branch only could be used (but with steps 14,
17, 19 and 22 retained to feed it), i.e. omit steps 15, 16,
18, 20, 21, 23 and 38.
[0065] The use of CLI to modify the expectations of a
speech service need not be restricted to the modification
of expected vocabulary items as already described.
Enquiry systems that require a certain level of security or
personal identification may also use CLI to their
advantage. The origin of the telephone call as given by the CLI
may be used to extract from a store the identity of a
number of individuals known to the system to be related
to this origin. This store may also contain representative
speech which is already verified to have come from
these individuals. If there is only one individual
authorised to access the given service from the designated
origin, or the caller has made a specific claim to identity
by means of additional information (e.g. a DTMF or
spoken personal identification number) then a spoken
utterance may be gathered from the caller and compared
with the stored speech patterns associated with that
claimed identity in order to verify that the person is who
they say that they are. Alternatively, if there are a
number of individuals associated with the call origin, the 
identity of the caller may be determined by gathering a 
spoken utterance from the caller and comparing it with 
stored speech patterns for each of the individuals in turn, 
selecting the most likely candidate that matches with a 
certain degree of confidence. 
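
The restriction of speaker verification and identification to the individuals associated with a call origin might be sketched as below. This is not a speaker recognition method: cosine similarity over small feature vectors is used purely as a stand-in for comparison with stored speech patterns, and the store contents, telephone number, names and threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity, a placeholder for a real speech pattern comparison."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical store: call origin -> {individual: stored reference features}.
SPEAKERS_BY_ORIGIN = {
    "01632960001": {"alice": [0.9, 0.1, 0.3], "bob": [0.2, 0.8, 0.5]},
}


def verify_speaker(cli, claimed, features, threshold=0.9):
    """Check a claimed identity against that individual's stored pattern only."""
    reference = SPEAKERS_BY_ORIGIN.get(cli, {}).get(claimed)
    return reference is not None and cosine(features, reference) >= threshold


def identify_speaker(cli, features, threshold=0.9):
    """Compare with each individual registered for this origin; return the best
    match that reaches the confidence threshold, or None."""
    best, best_score = None, threshold
    for name, reference in SPEAKERS_BY_ORIGIN.get(cli, {}).items():
        score = cosine(features, reference)
        if score >= best_score:
            best, best_score = name, score
    return best
```

Because only the individuals stored for the given origin are compared, a caller from an unknown origin is never matched.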

[0066] The CLI may also be used to access a store 
relating speech recognition models to the origin of the 
call. These speech models may then be loaded into the 
stores used by the speech recogniser. Thus, a call orig- 
inating from a cellular telephone, for example, may be 
dealt with using speech recognition models trained us- 
ing cellular speech data. A similar benefit may be de- 
rived for regional accents or different languages in a 
speech recognition system. 
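
Selecting a set of recognition models from the call origin could look like the following sketch. The prefix table and model set identifiers are assumptions made for illustration; a deployed system would consult whatever store actually relates origins to models.

```python
# Hypothetical identifiers for stored sets of recognition models.
MODEL_SETS = {"landline": "models/landline", "cellular": "models/cellular"}

# Assumed mapping from number prefix to type of originating apparatus.
ORIGIN_TYPE_BY_PREFIX = {"07": "cellular"}


def select_model_set(cli, default="landline"):
    """Return the model set to load into the recogniser for this call origin."""
    for prefix, kind in ORIGIN_TYPE_BY_PREFIX.items():
        if cli.startswith(prefix):
            return MODEL_SETS[kind]
    return MODEL_SETS[default]
```

The same lookup shape would apply to regional accents or languages, with the table keyed on area information instead of apparatus type.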



Claims 

1. A telephone information apparatus comprising a tel- 
ephone line connection; a speech recogniser for 
recognising spoken words received via the tele- 
phone line connection, by reference to recognition 
data representing a set of possible utterances; and 
means responsive to receipt via the telephone line 
connection of signals indicating the origin of a tele- 
phone call to access stored information identifying 
a subset of the set of utterances and to restrict the 
recogniser operation to that subset. 

2. A telephone information apparatus comprising a tel- 
ephone line connection; a speech recogniser for 
recognising spoken words received via the telephone line
connection, by reference to recognition
data representing a set of possible utterances; and 
means responsive to receipt via the telephone line 
connection of signals indicating the destination of a 
telephone call to access stored information identi- 
fying a subset of the set of utterances and to restrict 
the recogniser operation to that subset. 

3. Apparatus according to Claim 1 or 2, in which the 
apparatus includes a store containing recognition 
data for all words of the set and the control means 
is operable to mark in the recognition data store 
those items of data therein which correspond to the 
words not in the subset or those which correspond 
to words which are in the subset, whereby the rec- 
ognition means may ignore all words so marked or, 
respectively, not marked. 

4. Apparatus according to Claim 1 or 2, in which the 
control means is operable to generate recognition 
data for each word of the subset. 

5. A telephone apparatus comprising a telephone line 
connection; a speech recogniser for determining or 
verifying the identity of the speaker of spoken words
received via the telephone line connection, by ref- 
erence to recognition data corresponding to a set 
of possible speakers; and means responsive to re- 
ceipt via the telephone line connection of signals in- 
dicating the origin of a telephone call to access 
stored information identifying a subset of the set of 
speakers and to restrict the recogniser operation to 
that subset. 

6. A telephone apparatus comprising a telephone line 
connection; a speech recogniser for determining or 
verifying the identity of the speaker of spoken words 
received via the telephone line connection, by ref- 
erence to recognition data corresponding to a set 
of possible speakers; and means responsive to re- 
ceipt via the telephone line connection of signals in- 
dicating the destination of a telephone call to access 
stored information identifying a subset of the set of 
speakers and to restrict the recogniser operation to 
that subset. 

7. A telephone information apparatus comprising a tel- 
ephone line connection; a speech recogniser for 
recognising spoken words received via the tele- 
phone line connection, by reference to one of a plu- 
rality of stored sets of recognition data; and means 
responsive to receipt via the telephone line connec- 
tion of signals indicating the origin of a telephone 
call to access stored information identifying one of 
the sets of recognition data and to supply this set to 
the recogniser. 

8. A telephone information apparatus comprising a telephone
line connection; a speech recogniser for
recognising spoken words received via the tele- 
phone line connection, by reference to one of a plu- 
rality of stored sets of recognition data; and means 
responsive to receipt via the telephone line connec- 
tion of signals indicating the destination of a tele- 
phone call to access stored information identifying 
one of the sets of recognition data and to supply this 
set to the recogniser. 

9. A telephone information apparatus according to 
Claim 7 or 8 in which the stored sets correspond to 
different languages or regional accents. 

10. A telephone information apparatus according to 
Claim 7 or 8 in which at least two of the sets corre- 
spond to the characteristics of different types of tel- 
ephone apparatus. 

11. A telephone information apparatus according to 
Claim 10 in which one of the sets corresponds to 
the characteristics of a mobile telephone channel. 

12. A speech recognition apparatus comprising: 
a store defining a first set of words;
a store defining a second set of words;
a store containing entries to be identified;
a store containing information relating each en- 
try to a word of the first set and to a word of the 
second set; 

speech recognition means operable upon re- 
ceipt of a first voice signal to identify as many 
words of the first set as meet a predetermined 
recognition criterion; 

means to generate a list of all words of the sec- 
ond set which are related to an entry to which 
the identified word(s) of the first set is also re- 
lated; and 

speech recognition means operable upon re- 
ceipt of a second voice signal to identify one or 
more words of the list. 

13. A recognition apparatus comprising: 

a store defining a first set of patterns; 
a store defining a second set of patterns; 
a store containing entries to be identified; 
a store containing information relating each en- 
try to a pattern of the first set and to a pattern 
of the second set; 

recognition means operable upon receipt of a 
first input pattern signal to identify as many pat- 
terns of the first set as meet a predetermined 
recognition criterion; 

means to generate a list of all patterns of the 
second set which are related to an entry to 
which an identified pattern(s) of the first set is 
also related; and recognition means operable 
upon receipt of a second input pattern signal to 
identify one or more patterns of the list. 

14. A method of identifying entries in a store of data by 
reference to stored information defining connec- 
tions between entries and words, comprising: 



(a) identifying one or more of the said words as 
present in received voice signals; 

(b) compiling a list of those of the said words
defined as connected with entries defined as
connected also with the identified word(s);

(c) identifying one or more of the words of the 
list as present in the received voice signals. 

15. A speech recognition apparatus comprising:

(a) a store of data containing entries to be identified
and information defining for each entry a
connection with at least two words;
(b) a speech recognition means able to identify
by reference to stored recognition information
for a defined set of words at least one word or
word sequence which meets some predefined
criterion of similarity to a received voice signal;
(c) a control means operable:

i) to compile a list of words which are defined
as connected with entries defined as
connected with a word previously identified
by the speech recognition means; and
ii) so to control the speech recognition
means as to identify by reference to
recognition information for the compiled list one
or more words or word sequences which
resemble a further received voice signal.

16. A method of speech recognition comprising: 

(a) receiving a speech signal; 

(b) storing the speech signal; 

(c) performing a recognition operation on the 
speech signal or some other signal; 

(d) in the event of the recognition operation failing
to meet a predetermined criterion of reliability,
retrieving the stored speech signal and
performing a recognition operation thereon.